By Dhruv Parthasarathy
(Director of AI Programs @ Udacity)


Eigenvectors


In the last post, we developed an intuition for matrices. We found that they are just compact representations of linear maps and that adding and multiplying matrices are just ways of combining the underlying linear maps.
In this post, we're going to dive deeper into the world of linear algebra and cover eigenvectors. Eigenvectors are central to linear algebra and help us understand many interesting properties of linear maps, including:
  1. The effect of applying the linear map repeatedly to an input.
  2. How the linear map rotates the space. In fact, eigenvectors were first derived to study the axis of rotation of planets!

Eigenvectors helped early mathematicians study how the planets rotate. Image Source: Wikipedia.

For a more modern example, eigenvectors are at the heart of one of the most important algorithms of all time - the original PageRank algorithm that powers Google Search.

Our Goals

In this post we're going to try and derive eigenvectors ourselves. To really create a strong motivation, we're going to explore basis vectors, matrices in different bases, and matrix diagonalization. So hang in there and wait for the big reveal - I promise it will be really exciting when it all comes together!
Everything we'll be doing is going to be in the 2D space $\mathbb{R}^2$ - the standard coordinate plane over real numbers you're probably already used to.

Basis Vectors

We saw in the last post how we can derive the matrix for a given linear map $f$:
$f(x)$ (as we defined it in the previous section) can be represented by the notation
$$\begin{bmatrix} f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) & f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}$$
This is extremely cool - we can describe the entire function, and how it operates on an infinite number of points, with a little four-value table.
But why did we choose $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ to define the columns of the matrix? Why not some other pair like $\begin{bmatrix} 3 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$?
Intuitively, we think of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ as units that we can use to create other vectors. In fact, we can break down every vector in $\mathbb{R}^2$ into some combination of these two vectors.
We can reach any point in the coordinate plane by combining our two vectors.
More formally, when two vectors are able to combine in different ways to create all other vectors in $\mathbb{R}^2$, we say that those vectors *span* the space. The minimum number of vectors you need to span $\mathbb{R}^2$ is 2. So when we have 2 vectors that span $\mathbb{R}^2$, we call those vectors a basis.
$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are basis vectors for $\mathbb{R}^2$.
You can think of basis vectors as the minimal building blocks for the space: we can combine them in different amounts to reach any vector we care about.

We can think of basis vectors as the building blocks of the space - we can combine them to create all possible vectors in the space. Image Source: instructables.com.


Other Basis Vectors for $\mathbb{R}^2$

Now, are there other pairs of vectors that also form a basis for $\mathbb{R}^2$?
Let's start with an example that definitely won't work.

Bad Example

$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 0 \end{bmatrix}$.
Can you combine these vectors to create $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$? Clearly you can't - we don't have any way to move in the $y$ direction.
No combination of these two vectors could possibly get us the vector $P$.

Good Example

What about $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$?

Our new basis vectors.
Surprisingly, you can! The below image shows how we can reach our previously unreachable point $P$.

Note that we can combine $3$ units of $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $-1$ units of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to get the vector $P$.
I'll leave a simple proof of this as an appendix at the end of this post so we can keep moving - but it's not too complicated, so if you're up for it, give it a go! The main thing we've learned here is that:
There are multiple valid bases for $\mathbb{R}^2$.
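To make this concrete, here's a quick numerical check (a sketch of mine using numpy, not from the original post): two vectors form a basis of $\mathbb{R}^2$ exactly when the matrix holding them as columns is invertible, and expressing a point in a basis is just solving a small linear system.

```python
import numpy as np

# Candidate basis vectors, stored as matrix columns.
good = np.array([[1.0, 1.0],
                 [0.0, 1.0]])    # {(1,0), (1,1)} - our good example
bad = np.array([[1.0, -1.0],
                [0.0,  0.0]])    # {(1,0), (-1,0)} - our bad example

# A pair spans R^2 exactly when this determinant is nonzero.
print(np.linalg.det(good))   # 1.0 -> a valid basis
print(np.linalg.det(bad))    # 0.0 -> not a basis

# Express P = (2, 3) in the good basis: solve good @ coeffs = P.
P = np.array([2.0, 3.0])
print(np.linalg.solve(good, P))  # [-1. 3.] -> P = -1*(1,0) + 3*(1,1)
```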

Bases as New Coordinate Axes

In many ways, choosing a new basis is like choosing a new set of axes for the coordinate plane. When we switch our basis to, say, $B = \left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right\}$, our axes just rotate as shown below:

As our second basis vector changes from $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ to $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, the $y$ axis rotates to line up with $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

As a result of this, the same notation for a vector means different things in different bases.
In the original basis, $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ meant:
  • The vector you get when you compute $3 \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 4 \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix}$.
  • Or just $3 \cdot$ (first basis vector) plus $4 \cdot$ (second basis vector).
In our usual notation, $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ means $3$ units of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $4$ units of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

Now, when we use a different basis, the meaning of this notation actually changes.
For the basis $B = \left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right\}$, the vector $\begin{bmatrix} 3 \\ 4 \end{bmatrix}_B$ means:
  • The vector you get from $3 \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 4 \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
You can see this change below:
In the notation of basis $B$, $\begin{bmatrix} 3 \\ 4 \end{bmatrix}_B$ means $3$ units of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $4$ units of $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, giving us the point $P_B$.
By changing the underlying axes, we changed the location of $P$ even though it's still called $(3, 4)$. You can see this below:

The point $P$ also changes position when we change the basis. It is still $3$ parts first basis vector and $4$ parts second basis vector - but since the underlying basis vectors have changed, the point itself moves.
So the vectors $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 4 \end{bmatrix}_B$ refer to different actual vectors, depending on the basis.
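If it helps to see this as computation, here's a small numpy sketch (my own illustration): the standard coordinates of a vector written in basis $B$ come from multiplying by the matrix whose columns are the basis vectors.

```python
import numpy as np

# Columns are the basis vectors of B = {(1,0), (1,1)}.
basis = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

coords_B = np.array([3.0, 4.0])   # the notation [3, 4]_B

# Standard coordinates: 3*(1,0) + 4*(1,1).
standard = basis @ coords_B
print(standard)   # [7. 4.] -> [3,4]_B is a different point than (3,4)

# Going back (standard -> B-coordinates) just inverts the step.
print(np.linalg.solve(basis, standard))   # [3. 4.]
```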

Matrix Notation Based on Bases

Similarly, the same notation also means different things for matrices, depending on the basis. Earlier, the matrix $F$ for the function $f$ was represented by:
$$F = \begin{bmatrix} f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) & f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) \end{bmatrix}$$
When I use the basis $B = \left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right\}$, the matrix $F_B$ in basis $B$ becomes:
$$F_B = \begin{bmatrix} f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)_B & f\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)_B \end{bmatrix}$$
More generally, for a basis $B = \{b_1, b_2\}$, the matrix is:
$$F_B = \begin{bmatrix} f(b_1)_B & f(b_2)_B \end{bmatrix}$$

The Power of Diagonals

We took this short detour into notation for a very specific reason - rewriting a matrix in a different basis is actually a neat trick that allows us to reconfigure the matrix to make it easier to use. How? Let's find out with a quick example.
Let's say I have a matrix $F$ (representing a linear function) that I need to apply again and again (say 5 times) to a vector $v$.
This would be:
$$F \cdot F \cdot F \cdot F \cdot F \cdot v$$
Usually, calculating this is really cumbersome.

Can you imagine doing this 5 times in a row? Yeesh. Image Source: Wikipedia.

But let's imagine for a moment that $F$ was a diagonal matrix (i.e. something like $F = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$). If this were the case, then this multiplication would be EASY.
Why? Let's see what $F \cdot F$ is:
$$F \cdot F = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \cdot \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$$
$$F \cdot F = \begin{bmatrix} a \cdot a + 0 \cdot 0 & a \cdot 0 + 0 \cdot b \\ 0 \cdot a + b \cdot 0 & 0 \cdot 0 + b \cdot b \end{bmatrix}$$
$$F \cdot F = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix}$$

More generally,

$$F^n = \begin{bmatrix} a^n & 0 \\ 0 & b^n \end{bmatrix}$$
This is way easier to work with!
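A quick numerical illustration (my sketch, not from the post) of why diagonal matrices are so pleasant: raising one to a power just raises its diagonal entries to that power.

```python
import numpy as np

F = np.diag([2.0, 5.0])   # a diagonal matrix with a = 2, b = 5

# Multiplying F by itself five times...
print(np.linalg.matrix_power(F, 5))
# [[  32.    0.]
#  [   0. 3125.]]

# ...is the same as powering each diagonal entry: 2**5 = 32, 5**5 = 3125.
print(np.diag([2.0 ** 5, 5.0 ** 5]))
```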
So how can we get $F$ to be a diagonal matrix?


Which Basis makes a Matrix Diagonal?

Earlier, we saw that choosing a new basis changes how we write down the matrix. So can we find a basis $B = \{b_1, b_2\}$ that converts $F$ into a diagonal matrix?
From earlier, we know that $F_B$, the matrix $F$ in the basis $B$, is written as:
$$F_B = \begin{bmatrix} f(b_1)_B & f(b_2)_B \end{bmatrix}$$
For this to be diagonal, we must have:
$$F_B = \begin{bmatrix} f(b_1)_B & f(b_2)_B \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}_B$$
for some $\lambda_1$ and $\lambda_2$ (i.e. the top-right and bottom-left elements are $0$).
This implies:
  1. $f(b_1)_B = \begin{bmatrix} \lambda_1 \\ 0 \end{bmatrix}_B$.
  2. $f(b_2)_B = \begin{bmatrix} 0 \\ \lambda_2 \end{bmatrix}_B$.
Recall our discussion on vector notation in a different basis:
Say my basis is $B = \left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right\}$.
Then the vector $\begin{bmatrix} 3 \\ 4 \end{bmatrix}_B$ means:
  • The vector you get when you compute $3 \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 4 \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
So, we know the following additional information:
$$f(b_1) = \begin{bmatrix} \lambda_1 \\ 0 \end{bmatrix}_B = \lambda_1 \cdot b_1 + 0 \cdot b_2$$

$$f(b_1) = \lambda_1 \cdot b_1$$
Similarly,
$$f(b_2) = \begin{bmatrix} 0 \\ \lambda_2 \end{bmatrix}_B = 0 \cdot b_1 + \lambda_2 \cdot b_2$$

$$f(b_2) = \lambda_2 \cdot b_2$$

Seeing this Visually

What do these vectors look like on our coordinate axes?
We saw earlier that choosing a new basis $B = \{b_1, b_2\}$ creates a new coordinate axis for $\mathbb{R}^2$, like below:
A new basis $B = \{b_1, b_2\}$ gives us new coordinate axes.
Let's plot $f(b_1)_B = \begin{bmatrix} \lambda_1 \\ 0 \end{bmatrix}_B$:

In the graph above, we can see that $\begin{bmatrix} \lambda_1 \\ 0 \end{bmatrix}_B = \lambda_1 b_1$, so
$$f(b_1) = \lambda_1 b_1$$

Similarly, let's plot $f(b_2)_B = \begin{bmatrix} 0 \\ \lambda_2 \end{bmatrix}_B$:

From the above, we see clearly that $\begin{bmatrix} 0 \\ \lambda_2 \end{bmatrix}_B = \lambda_2 b_2$, so
$$f(b_2) = \lambda_2 b_2$$

Rules For Getting a Diagonal

So if we can find a basis $B$ formed by $b_1$ and $b_2$ such that:
  • $f(b_1) = \lambda_1 b_1$ and
  • $f(b_2) = \lambda_2 b_2$,


then $F$ can be rewritten as $F_B$, where
$$F_B = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$
A nice diagonal matrix!


Enter Eigenvectors

Is there a special name for the vectors $b_1$ and $b_2$ above that magically let us rewrite a matrix as a diagonal? Yes! These vectors are the eigenvectors of $f$. That's right - you derived eigenvectors all by yourself.

You the real MVP.

More formally, we define an eigenvector of $f$ as any non-zero vector $v$ such that:
$$f(v) = \lambda v$$
or
$$F \cdot v = \lambda v$$

The basis formed by the eigenvectors is known as the eigenbasis. Once we switch to using the eigenbasis, our original problem of finding $f \circ f \circ f \circ f \circ f(v)$ becomes:
$$F_B \cdot F_B \cdot F_B \cdot F_B \cdot F_B \cdot v_B$$
$$= \begin{bmatrix} \lambda_1^5 & 0 \\ 0 & \lambda_2^5 \end{bmatrix}_B \cdot v_B$$
So. Much. Easier.

An Example

Well, this has all been pretty theoretical with abstract vectors like $b$ and $v$ - let's make it concrete with real vectors and matrices to see it in action.
Imagine we had the matrix $F = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$. Since the goal of this post is not learning how to find eigenvectors, I'm just going to give you the eigenvectors for this matrix. They are:
$$b_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
$$b_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
The eigenbasis is just $B = \{b_1, b_2\}$.
What is $F_B$, the matrix $F$ written in the eigenbasis $B$?
Since $F_B = \begin{bmatrix} f(b_1)_B & f(b_2)_B \end{bmatrix}$, we need to find $f(b_1)_B$ and $f(b_2)_B$.
We'll break this down by first finding $f(b_1)$ and $f(b_2)$, and then rewriting them in the notation of the eigenbasis $B$.

Finding $f(b_1)$

$f(b_1)$ is:
$$f(b_1) = F \cdot b_1 = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
$$f(b_1) = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

Finding $f(b_2)$

Similarly,
$$f(b_2) = F \cdot b_2 = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
$$f(b_2) = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

Rewriting the vectors in the basis $B$

We've now found $f(b_1)$ and $f(b_2)$. We need to rewrite these vectors in the notation for our new basis $B$.
What's $f(b_1)_B$?
$$f(b_1) = \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 1 \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix} + 0 \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1 \cdot b_1 + 0 \cdot b_2$$
$$f(b_1)_B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
Similarly,
$$f(b_2) = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 0 \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix} + 3 \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 0 \cdot b_1 + 3 \cdot b_2$$
$$f(b_2)_B = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$$

Putting this all together,

$$F_B = \begin{bmatrix} f(b_1)_B & f(b_2)_B \end{bmatrix}$$
$$F_B = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$$
So we get the nice diagonal we wanted!
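To double-check the arithmetic (a numpy sketch of mine, not part of the original post), we can rebuild $F_B$ and use it to apply $f$ five times the easy way:

```python
import numpy as np

F = np.array([[2.0, 1.0],
              [1.0, 2.0]])
P = np.array([[1.0, 1.0],
              [-1.0, 1.0]])    # columns are the eigenvectors b1, b2

# F rewritten in the eigenbasis (equivalent to inv(P) @ F @ P).
F_B = np.linalg.solve(P, F @ P)
print(F_B)
# [[1. 0.]
#  [0. 3.]]

# Apply f five times to some vector v using the diagonal form.
v = np.array([2.0, 3.0])
v_B = np.linalg.solve(P, v)              # rewrite v in the eigenbasis
F_B5 = np.diag(np.diag(F_B) ** 5)        # just power the diagonal: 1 and 243
print(P @ (F_B5 @ v_B))                  # [607. 608.]

# Same answer the slow way: five full matrix multiplications.
print(np.linalg.matrix_power(F, 5) @ v)  # [607. 608.]
```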

Geometric Interpretation of Eigenvectors

Eigenvectors also have extremely interesting geometric properties worth understanding. To see this, let's go back to the definition of an eigenvector of a linear map $f$ and its matrix $F$.
An eigenvector is a vector $v$ such that:
$$F \cdot v = \lambda v$$
How are $\lambda v$ and $v$ related? $\lambda v$ is just a scaling of $v$ in the same direction - it can't be rotated in any way.

Notice how $\lambda v$ is in the same direction as $v$. Image Source: Wikipedia.

In this sense, the eigenvectors of a linear map $f$ show us the axes along which the map simply scales or stretches its inputs.
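You can check this numerically with numpy's eigensolver (a sketch of mine; note that np.linalg.eig returns unit-length eigenvectors as the columns of its second output):

```python
import numpy as np

F = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(F)
print(eigenvalues)   # [3. 1.] (the order is not guaranteed)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # F @ v lands on the same line as v - just scaled by lambda.
    print(F @ v, lam * v)   # the two printed vectors match
```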

The single best visualization I've seen of this is by 3Blue1Brown, who has a fantastic YouTube channel on visualizing math in general.
I'm embedding his video on eigenvectors and their visualizations below, as it is the best geometric intuition out there:

Source: 3Blue1Brown


Google and Eigenvectors

Like we saw at the beginning of this post, eigenvectors are not just an abstract concept used by eccentric mathematicians in dark rooms - they underpin some of the most useful technology in our lives, including Google Search. For the brave, here's Larry Page and Sergey Brin's original paper on PageRank, the algorithm that makes it possible for us to type a few letters into a search box and instantly find every relevant website on the internet.
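To preview the idea with a toy sketch (mine, and nothing like Google's real scale or data): PageRank models the web as a matrix of link-following probabilities, and the ranking is an eigenvector of that matrix with eigenvalue $1$, found by applying the matrix over and over - exactly the repeated-application setting we just made easy.

```python
import numpy as np

# A made-up 3-page web. Column j says where page j's links point,
# as probabilities (each column sums to 1).
L = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])

# Power iteration: keep applying L until the rank vector settles.
rank = np.ones(3) / 3
for _ in range(50):
    rank = L @ rank

print(rank)   # ~[0.444 0.222 0.333] - an eigenvector of L with eigenvalue 1
```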
In the next post, we're going to actually dig through this paper and see how eigenvectors are applied in Google Search!
Stay tuned.

Appendix

Proof that $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ span $\mathbb{R}^2$:
  1. We know already that $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ can be used to reach every coordinate.
  2. We can create $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ by computing:
    • $\begin{bmatrix} 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$
  3. Thus we can combine our vectors to obtain both $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. By point 1, this means every vector in $\mathbb{R}^2$ is reachable by combining $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.